Feature Extraction and Feature Selection: Reducing Data Complexity With Apache Spark
Similar Resources
Feature Extraction and Feature Selection: Reducing Data Complexity with Apache Spark
Feature extraction and feature selection are the first tasks in pre-processing input logs in order to detect cyber security threats and attacks using machine learning. When it comes to the analysis of heterogeneous data derived from different sources, these tasks are found to be time-consuming and difficult to manage efficiently. In this paper, we present an approach for handli...
An Information Theoretic Feature Selection Framework for Big Data under Apache Spark
With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets (both in number of instances and features). The purpose of this work is to demonstrate that standard feature selection methods can be paralleli...
Feature Selection and Non-linear Feature Extraction
Feature extraction and feature selection are two important tasks in pattern recognition. Classification algorithms like k-nearest neighbors, which are based on the assumption that patterns in the same class are close to each other and those in different classes are far apart (locality property), rely heavily on the quality of the features extracted from the input data. In this work, an objective ...
Massively Parallel Unsupervised Feature Selection on Spark
High dimensional data sets pose important challenges such as the curse of dimensionality and increased computational costs. Dimensionality reduction is therefore a crucial step for most data mining applications. Feature selection techniques allow us to achieve said reduction. However, it is nowadays common to deal with huge data sets, and most existing feature selection algorithms are designed ...
A Real-Time Electroencephalography Classification in Emotion Assessment Based on Synthetic Statistical-Frequency Feature Extraction and Feature Selection
Purpose: To assess three main emotions (happy, sad and calm) by various classifiers, using appropriate feature extraction and feature selection. Materials and Methods: In this study a combination of Power Spectral Density and a series of statistical features are proposed as statistical-frequency features. Next, a feature selection method from pattern recognition (PR) Tools is presented to e...
Journal
Journal title: SSRN Electronic Journal
Year: 2017
ISSN: 1556-5068
DOI: 10.2139/ssrn.3432178